Efficient Barriers for Distributed Shared Memory Computers

University of Colorado at Boulder

Authors

  • Dirk Grunwald
  • Suvas Vajracharya
Abstract

Barrier algorithms are central to the performance of numerous algorithms on scalable, high-performance architectures. Numerous barrier algorithms have been suggested and studied for Non-Uniform Memory Access (NUMA) architectures, but less work has been done for Cache Only Memory Access (COMA) or attraction memory [2] architectures such as the KSR-1. In this paper, we present two new barrier algorithms that offer the best performance we have recorded on the KSR-1 distributed cache multiprocessor. We discuss the trade-offs and the performance of seven algorithms on two architectures. The new barrier algorithms adapt well to a hierarchical caching memory model and take advantage of parallel communication offered by most multiprocessor interconnection networks. Performance results are shown for a 256-processor KSR-1 and a 20-processor Sequent Symmetry.

Similar resources

Parallel Sparse Matrix by Vector Multiplication using a Shared Virtual Memory Environment

Many iterative schemes in scientific applications require the multiplication of a sparse matrix by a vector. This kernel has been mainly studied on vector processors and shared-memory parallel computers. In this paper, we address the implementation issues when using a shared virtual memory system on a distributed memory parallel computer. We study in detail the impact of loop distribution schem...

Communicable Memory and Lazy Barriers for Bulk Synchronous Parallelism in BSPk

Communication and synchronization stand as the dual bottlenecks in the performance of parallel systems, and especially those that attempt to alleviate the programming burden by incurring overhead in these two domains. We formulate the notions of communicable memory and lazy barriers to help achieve efficient communication and synchronization. These concepts are developed in the context of BSPk, a...

An Algebraic Multilevel Parallelizable Preconditioner for Large-Scale CFD Problems

An efficient parallelizable preconditioner for solving large-scale CFD problems is presented. It is adapted to coarse-grain parallelism and can be used for both shared and distributed-memory parallel computers. The proposed preconditioner consists of two independent approximations of the system matrix. The first one is a block-diagonal, fully parallelizable approximation of the given system. The s...

Fast Message Passing via the ALLCACHE Memory on KSR Computers

A large body of applications has been built that uses a message-passing style of inter-process communication. Thus, it is important to be able to support efficient message-passing even on shared-memory computers. Unfortunately, 'direct' porting of message-passing packages to shared-memory computers invariably produces unacceptably poor performance. In this paper, we discuss schemes for efficiently...

Deriving Efficient Cache Coherence Protocols through Refinement

We address the problem of developing efficient cache coherence protocols implementing distributed shared memory (DSM) using message passing. A serious drawback of traditional approaches to this problem is that designers are required to state the desired coherence protocol at the level of asynchronous message interactions. We propose a method in which designers express the desired protocol at a hi...

Publication date: 1994